Minimum word error training of RNN-based voice activity detection

نویسندگان

  • Gregory Gelly
  • Jean-Luc Gauvain
چکیده

Voice Activity Detection (VAD) is critical in speech recognition systems as it can dramatically impact the recognition accuracy especially on noisy data. This paper presents a novel method which applies Minimum Word Error (MWE) training to a Long Short-Term Memory RNN to optimize Voice Activity Detection for speech recognition. Experiments compare speech recognition WERs using RNN VAD with other commonly used VAD methods for two corpora: the conversational Vietnamese corpus used in the NIST OpenKWS13 evaluation and a corpus of French telephone conversations. The proposed VAD method combining MWE training with RNN yields the best ASR results. This MWE training scheme appears to be particularly useful for low resource ASR tasks, as exemplified by the IARPA BABEL data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-linear estimation of voice activ recognition of nois

Feed-forward multi-layer perceptrons (MLP) and recurrent neural networks (RNN) fed with different sets of acoustic features are proposed for computing the presence and absence of speech in continuous speech signal in presence of various levels of background noise. Detailed performance evaluations on voice activity detection (VAD) are reported using the Aurora2, Aurora3 and TIMIT corpora. It is ...

متن کامل

A statistical model-based voice activity detection employing minimum classification error technique

In this paper, we apply a discriminative weight training to a statistical model-based voice activity detection (VAD). In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratios (LRs) based on a minimum classification error (MCE) method. That approach is different from that of previous works in that different weights are assigned to each fre...

متن کامل

CHiME4: Multichannel Enhancement Using Beamforming Driven by DNN-based Voice Activity Detection

In this work, we focus on methods for enhancing the sixchannel CHiME4 data using beamforming that is driven by voice activity detectors (VAD). We propose two beamformers and two VADs that are based on trained deep neural networks (DNN). Their combinations are compared when used as frontends whose outputs are forwarded to the baseline automatic speech recognition system. Results in term of Word-...

متن کامل

Voice Activity Detection based on Optima Multiple Featu

This paper presents a voice activity detection (VAD) scheme that is robust against noise, based on an optimally weighted combination of features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model likelihood. This combination in effect selects the optimal method depending on the noise co...

متن کامل

Voice Activity Detection Based on Discriminative Weight Training Incorporating a Spectral Flatness Measure

In this paper, we present an approach to incorporate discriminative weight training into a statistical model-based voice activity detection (VAD) method. In our approach, the VAD decision rule is derived from the optimally weighted likelihood ratios (LRs) using a minimum classification error (MCE) method. An adaptive online means of selecting two kinds of weights based on a power spectral flatn...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015